Goto

Collaborating Authors

 certain answer


A Unifying Framework for Incompleteness, Inconsistency, and Uncertainty in Databases

Communications of the ACM

Databases are often assumed to have definite content. The reality, though, is the database at hand may be deficient due to missing, invalid, or uncertain information. As a simple illustration, the primary address of a person may be missing, or it may conflict with another primary address, or it may be improbable given the presence of nearby businesses. A common practice to address this challenge is to rectify the database by fixing the gaps, as done in data imputation, entity resolution, and data cleaning. The process of rectifying the database, however, may involve arbitrary choices due to computational limitations, such as errors in statistical or machine-learning models, or mere lack of information that even humans cannot cope with in full confidence. In turn, answers to queries over the deficient database may depend on the choices made to rectify it; thus, the answers to queries may vary from one choice to choice, even though both choices may be equally legitimate. In the pursuit of principled solutions, there has been a continuous research effort to develop fundamental approaches for handling database deficiency with no (or with less) arbitrariness. The purpose of this review article is to highlight some of the ways in which the possible world semantics has been deployed as a principled approach to overcome database deficiency in different contexts. In this approach, we acknowledge that we need to rectify the deficiency: fill in missing information, delete wrong records (hereafter tuples or facts), correct erroneous values, and so on. Yet, since many rectifications may exist and since we do not know which is the correct one, we do not commit to a specific one. Instead, we view our deficient database as a representation of the results of all conceivable rectifications, each such rectification giving rise to a legitimate candidate of a valid database that we call a possible world. Since the possible worlds differ from each other, a query may produce different collections of answers (which are also tuples) when applied to different possible worlds. Therefore, query answering requires the use of an aggregation method to combine the query results over the possible worlds.


A Unifying Framework for Incompleteness, Inconsistency, and Uncertainty in Databases

Communications of the ACM

Databases are often assumed to have definite content. The reality, though, is that the database at hand may be deficient due to missing, invalid, or uncertain information. As a simple illustration, the primary address of a person may be missing, or it may conflict with another primary address, or it may be improbable given the presence of nearby businesses. A common practice to address this challenge is to rectify the database by fixing the gaps, as done in data imputation, entity resolution, and data cleaning. The process of rectifying the database, however, may involve arbitrary choices due to computational limitations, such as errors in statistical or machine-learning models, or mere lack of information that even humans cannot cope with in full confidence.


Deciding FO-rewritability of Regular Languages and Ontology-Mediated Queries in Linear Temporal Logic

Journal of Artificial Intelligence Research

Our concern is the problem of determining the data complexity of answering an ontology-mediated query (OMQ) formulated in linear temporal logic LTL over (Z,<) and deciding whether it is rewritable to an FO(<)-query, possibly with some extra predicates. First, we observe that, in line with the circuit complexity and FO-definability of regular languages, OMQ answering in AC0, ACC0 and NC1 coincides with FO(<,≡)-rewritability using unary predicates x ≡ 0 (mod n), FO(<,MOD)-rewritability, and FO(RPR)-rewritability using relational primitive recursion, respectively. We prove that, similarly to known PSᴘᴀᴄᴇ-completeness of recognising FO(<)-definability of regular languages, deciding FO(<,≡)- and FO(<,MOD)-definability is also PSᴘᴀᴄᴇ-complete (unless ACC0 = NC1). We then use this result to show that deciding FO(<)-, FO(<,≡)- and FO(<,MOD)-rewritability of LTL OMQs is ExᴘSᴘᴀᴄᴇ-complete, and that these problems become PSᴘᴀᴄᴇ-complete for OMQs with a linear Horn ontology and an atomic query, and also a positive query in the cases of FO(<)- and FO(<,≡)-rewritability. Further, we consider FO(<)-rewritability of OMQs with a binary-clause ontology and identify OMQ classes, for which deciding it is PSᴘᴀᴄᴇ-, Π2p- and coNP-complete.


Libkin

AAAI Conferences

The standard way of answering queries over incomplete databases is to compute certain answers, defined as the intersection of query answers on all complete databases that the incomplete database represents. But is this universally accepted definition correct? We argue that this "one-size-fits-all" definition can often lead to counterintuitive or just plain wrong results, and propose an alternative framework for defining certain answers. The idea of the framework is to move away from the standard, in the database literature, assumption that query results be given in the form of a database object, and to allow instead two alternative representations of answers: as objects defining all other answers, or as knowledge we can deduce with certainty about all such answers. We show that the latter is often easier to achieve than the former, that in general certain answers need not be defined as intersection, and may well contain missing information in them. We also show that with a proper choice of semantics, we can often reduce computing certain answers - as either objects or knowledge - to standard query evaluation. We describe the framework in the most general way, applicable to a variety of data models, and test it on three concrete relational semantics of incompleteness: open, closed, and weak closed world.


Libkin

AAAI Conferences

The standard way of answering queries over incomplete databases is to compute certain answers, defined as the intersection of query answers on all complete databases that the incomplete database represents. But is this universally accepted definition correct? We argue that this one-size-fits-all'' definition can often lead to counterintuitive or just plain wrong results, and propose an alternative framework for defining certain answers. We combine three previously used approaches, based on the semantics and representation systems, on ordering incomplete databases in terms of their informativeness, and on viewing databases as knowledge expressed in a logical language, to come up with a well justified and principled notion of certain answers. Using it, we show that for queries satisfying some natural conditions (like not losing information if a more informative input is given), computing certain answers is surprisingly easy, and avoids the complexity issues that have been associated with the classical definition.


Computing Rule-Based Explanations of Machine Learning Classifiers using Knowledge Graphs

arXiv.org Artificial Intelligence

The use of symbolic knowledge representation and reasoning as a way to resolve the lack of transparency of machine learning classifiers is a research area that lately attracts many researchers. In this work, we use knowledge graphs as the underlying framework providing the terminology for representing explanations for the operation of a machine learning classifier. In particular, given a description of the application domain of the classifier in the form of a knowledge graph, we introduce a novel method for extracting and representing black-box explanations of its operation, in the form of first-order logic rules expressed in the terminology of the knowledge graph.


QDEF and Its Approximations in OBDM

arXiv.org Artificial Intelligence

Given an input dataset (i.e., a set of tuples), query definability in Ontology-based Data Management (OBDM) amounts to find a query over the ontology whose certain answers coincide with the tuples in the given dataset. We refer to such a query as a characterization of the dataset with respect to the OBDM system. Our first contribution is to propose approximations of perfect characterizations in terms of recall (complete characterizations) and precision (sound characterizations). A second contribution is to present a thorough complexity analysis of three computational problems, namely verification (check whether a given query is a perfect, or an approximated characterization of a given dataset), existence (check whether a perfect, or a best approximated characterization of a given dataset exists), and computation (compute a perfect, or best approximated characterization of a given dataset).


Polynomial Rewritings from Expressive Description Logics with Closed Predicates to Variants of Datalog

arXiv.org Artificial Intelligence

In many scenarios, complete and incomplete information coexist. For this reason, the knowledge representation and database communities have long shown interest in simultaneously supporting the closed- and the open-world views when reasoning about logic theories. Here we consider the setting of querying possibly incomplete data using logic theories, formalized as the evaluation of an ontology-mediated query (OMQ) that pairs a query with a theory, sometimes called an ontology, expressing background knowledge. This can be further enriched by specifying a set of closed predicates from the theory that are to be interpreted under the closed-world assumption, while the rest are interpreted with the open-world view. In this way we can retrieve more precise answers to queries by leveraging the partial completeness of the data. The central goal of this paper is to understand the relative expressiveness of OMQ languages in which the ontology is written in the expressive Description Logic (DL) ALCHOI and includes a set of closed predicates. We consider a restricted class of conjunctive queries. Our main result is to show that every query in this non-monotonic query language can be translated in polynomial time into Datalog with negation under the stable model semantics. To overcome the challenge that Datalog has no direct means to express the existential quantification present in ALCHOI, we define a two-player game that characterizes the satisfaction of the ontology, and design a Datalog query that can decide the existence of a winning strategy for the game. If there are no closed predicates, that is in the case of querying a plain ALCHOI knowledge base, our translation yields a positive disjunctive Datalog program of polynomial size. To the best of our knowledge, unlike previous translations for related fragments with expressive (non-Horn) DLs, these are the first polynomial time translations.


Certain Answers to a SPARQL Query over a Knowledge Base (extended version)

arXiv.org Artificial Intelligence

Ontology-Mediated Query Answering (OMQA) is a well-established framework to answer queries over an RDFS or OWL Knowledge Base (KB). OMQA was originally designed for unions of conjunctive queries (UCQs), and based on certain answers. More recently, OMQA has been extended to SPARQL queries, but to our knowledge, none of the efforts made in this direction (either in the literature, or the so-called SPARQL entailment regimes) is able to capture both certain answers for UCQs and the standard interpretation of SPARQL over a plain graph. We formalize these as requirements to be met by any semantics aiming at conciliating certain answers and SPARQL answers, and define three additional requirements, which generalize to KBs some basic properties of SPARQL answers. Then we show that a semantics can be defined that satisfies all requirements for SPARQL queries with SELECT, UNION, and OPTIONAL, and for DLs with the canonical model property. We also investigate combined complexity for query answering under such a semantics over DL-Lite R KBs. In particular, we show for different fragments of SPARQL that known upper-bounds for query answering over a plain graph are matched.


Data Complexity and Rewritability of Ontology-Mediated Queries in Metric Temporal Logic under the Event-Based Semantics (Full Version)

arXiv.org Artificial Intelligence

We investigate the data complexity of answering queries mediated by metric temporal logic ontologies under the event-based semantics assuming that data instances are finite timed words timestamped with binary fractions. We identify classes of ontology-mediated queries answering which can be done in AC0, NC1, L, NL, P, and coNP for data complexity, provide their rewritings to first-order logic and its extensions with primitive recursion, transitive closure or datalog, and establish lower complexity bounds.